feat(sinker): Ship 8 — Yuv444p RGBA via const-ALPHA template #19

Merged: uqio merged 2 commits into main from feat/ship8-rgba-yuv444p on Apr 26, 2026

al8n (Collaborator) commented Apr 26, 2026

Tranche 4a of Ship 8 sink-side RGBA. Refactors the Yuv444p planar 4:4:4 kernel family across all 6 backends (scalar + NEON + SSE4.1 + AVX2 + AVX-512 + wasm simd128) using the const-generic-ALPHA template established by PR #16 (Yuv420p) and extended in PR #17 (NV12/NV21).

Tranche 4 was split into three sub-PRs (4a, 4b, 4c) because each format family in it has a different shape; the tracker table under Scope lists what each one covers.

Scope

Sink-side only, default opaque alpha (0xFF). Per the tracker in docs/color-conversion-functions.md § Ship 8:

| # | Tranche | Formats | Status |
|---|---|---|---|
| 1 | 4:2:0 planar | Yuv420p | ✅ shipped (PR #16) |
| 2 | 4:2:0 semi-planar | Nv12, Nv21 | ✅ shipped (PR #17) |
| 3 | 4:2:2 planar + semi-planar | Yuv422p, Nv16 | ✅ shipped (PR #18) |
| 4a | 4:4:4 planar | Yuv444p | this PR: kernel refactor across all 6 backends |
| 4b | 4:4:4 semi-planar | Nv24, Nv42 | next: shared `<SWAP_UV, ALPHA>` template (mirrors PR #17 NV12/NV21) |
| 4c | 4:4:0 planar | Yuv440p | wiring-only after 4a (reuses `yuv_444_to_rgba_row`) |
| 5 | High-bit-depth 4:2:0 | Yuv420p9/10/12/14/16, P010/P012/P016 | |
| 6 | High-bit-depth 4:2:2 | Yuv422p9/10/12/14/16, Yuv440p10/12, P210/P212/P216 | |
| 7 | High-bit-depth 4:4:4 | Yuv444p9/10/12/14/16, P410/P412/P416 | |

Usage:

use colconv::{
    frame::Yuv444pFrame,
    sinker::MixedSinker,
    yuv::{Yuv444p, yuv444p_to},
    ColorMatrix,
};

let frame = Yuv444pFrame::new(&y_plane, &u_plane, &v_plane, w, h, w, w, w);
let mut rgba = vec![0u8; (w * h * 4) as usize];
let mut sinker = MixedSinker::<Yuv444p>::new(w as usize, h as usize)
    .with_rgba(&mut rgba)?;

yuv444p_to(&frame, /*full_range=*/ true, ColorMatrix::Bt709, &mut sinker)?;
// rgba[4*i..4*i+3] = R,G,B; rgba[4*i+3] = 0xFF.

What's in this PR

Public API

  • MixedSinker<Yuv444p>::with_rgba(&mut [u8]) / set_rgba — format-specific impl block.
  • row::yuv_444_to_rgba_row(...) — public dispatcher paralleling the RGB variant.

Kernel work

| File | What's added |
|---|---|
| row/scalar.rs | `yuv_444_to_rgba_row` + shared `yuv_444_to_rgb_or_rgba_row<const ALPHA: bool>` template |
| arch/neon.rs | Same shape; uses native `vst4q_u8` when ALPHA = true, `vst3q_u8` otherwise |
| arch/x86_sse41.rs | Same shape; reuses `write_rgba_16` from PR #16 |
| arch/x86_avx2.rs | Same shape; reuses `write_rgba_32` from PR #16 |
| arch/x86_avx512.rs | Same shape; reuses `write_rgba_64` from PR #16 |
| arch/wasm_simd128.rs | Same shape; reuses wasm `write_rgba_16` from PR #16 |

The 4:4:4 kernel is structurally simpler than 4:2:0 (one UV pair per Y pixel, no chroma upsampling), so the const-generic-ALPHA refactor is mechanical: only the per-block store branches on ALPHA. Each backend now exposes three functions: yuv_444_to_rgb_row and yuv_444_to_rgba_row are thin wrappers over the shared yuv_444_to_rgb_or_rgba_row template, which is monomorphized once per ALPHA value.
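The shape described above can be sketched in scalar form. This is a minimal illustration under stated assumptions, not the crate's code: the function names follow the PR, but the signatures are simplified stand-ins (the real ones also take a matrix, a range flag, and a SIMD switch), and the math is hard-wired to BT.709 full range with Q15 coefficients chosen for this example.

```rust
/// Simplified stand-in for the shared template (assumed signature).
/// ALPHA selects the output stride and whether an opaque alpha byte is stored.
fn yuv_444_to_rgb_or_rgba_row<const ALPHA: bool>(
    y: &[u8],
    u: &[u8],
    v: &[u8],
    out: &mut [u8],
    width: usize,
) {
    let bpp: usize = if ALPHA { 4 } else { 3 };
    assert!(y.len() >= width && u.len() >= width && v.len() >= width);
    assert!(out.len() >= width * bpp);

    for i in 0..width {
        // 4:4:4: one (U, V) pair per Y sample, so no chroma upsampling.
        let luma = y[i] as i32;
        let cb = u[i] as i32 - 128;
        let cr = v[i] as i32 - 128;

        // BT.709 full-range coefficients in Q15, with rounding:
        //   R = Y + 1.5748 Cr
        //   G = Y - 0.1873 Cb - 0.4681 Cr
        //   B = Y + 1.8556 Cb
        let r = luma + ((51_603 * cr + (1 << 14)) >> 15);
        let g = luma - ((6_137 * cb + 15_339 * cr + (1 << 14)) >> 15);
        let b = luma + ((60_804 * cb + (1 << 14)) >> 15);

        let px = &mut out[i * bpp..(i + 1) * bpp];
        px[0] = r.clamp(0, 255) as u8;
        px[1] = g.clamp(0, 255) as u8;
        px[2] = b.clamp(0, 255) as u8;
        // The only place the two variants differ: the store.
        if ALPHA {
            px[3] = 0xFF;
        }
    }
}

/// Thin RGB wrapper over the shared template.
pub fn yuv_444_to_rgb_row(y: &[u8], u: &[u8], v: &[u8], out: &mut [u8], width: usize) {
    yuv_444_to_rgb_or_rgba_row::<false>(y, u, v, out, width)
}

/// Thin RGBA wrapper over the shared template.
pub fn yuv_444_to_rgba_row(y: &[u8], u: &[u8], v: &[u8], out: &mut [u8], width: usize) {
    yuv_444_to_rgb_or_rgba_row::<true>(y, u, v, out, width)
}
```

Because ALPHA is a const generic, the `if ALPHA` branch is resolved at compile time in each monomorphized copy, so neither variant carries a runtime check in the inner loop.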

MixedSinker integration

RGBA runs as an independent kernel call rather than being composed from the RGB output, the same pattern as Yuv420p (PR #16) and NV12/NV21 (PR #17). Default alpha = 0xFF since Yuv444p has no alpha plane.

Doc updates

  • docs/color-conversion-functions.md § Ship 8 — split tranche 4 into 4a (this PR — Yuv444p), 4b (Nv24 / Nv42), 4c (Yuv440p wiring).
  • The compile_fail doctest negative example on MixedSinker::<Yuv420p>::with_rgba moved forward from Yuv444p to Nv24 (next not-yet-wired format).

Tests

+6 lib tests, total 459 (was 453):

| Layer | Tests added |
|---|---|
| Format-level Yuv444p | 4: gray-to-gray + opaque alpha, RGB-byte invariant, buffer-too-short, random-YUV SIMD parity (1922×4 frame, all 4 matrices × both ranges) |
| NEON per-backend (verified locally) | 2: 16-pixel all-matrices, varied widths (1, 3, 15, 17, 32, 33, 1920, 1921, including odd widths to validate the 4:4:4 no-parity contract) |
| SSE4.1 per-backend (CI) | 2: same shape |
| AVX2 per-backend (CI) | 2: 32-pixel main loop + tail widths |
| AVX-512 per-backend (CI) | 2: 64-pixel main loop + tail widths |
| wasm simd128 per-backend (CI) | 2: 16-pixel + tail widths |

Per-backend tests bypass the dispatcher (call each backend's unsafe yuv_444_to_rgba_row directly under runtime feature detection) so on AVX-512-capable CI runners all three x86 paths run; the existing CI matrix (avx512 SDE + AVX2-max + SSE4.1-max + scalar tarpaulin tiers) covers every backend.
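The varied-width parity checks listed above can be sketched without any intrinsics. In this illustration, `rgba_row_blocked` is a hypothetical stand-in for a SIMD backend (a 16-pixel main loop plus a scalar tail), `rgba_row_scalar` plays the reference, and the pixel math is a simplified BT.709 full-range approximation assumed for the example. None of these names come from the PR.

```rust
/// Per-pixel reference math (assumed BT.709 full range, Q15 coefficients).
fn px(y: u8, u: u8, v: u8) -> [u8; 4] {
    let (y, cb, cr) = (y as i32, u as i32 - 128, v as i32 - 128);
    let r = y + ((51_603 * cr + (1 << 14)) >> 15);
    let g = y - ((6_137 * cb + 15_339 * cr + (1 << 14)) >> 15);
    let b = y + ((60_804 * cb + (1 << 14)) >> 15);
    [r.clamp(0, 255) as u8, g.clamp(0, 255) as u8, b.clamp(0, 255) as u8, 0xFF]
}

/// Scalar reference: one pixel at a time.
fn rgba_row_scalar(y: &[u8], u: &[u8], v: &[u8], out: &mut [u8], width: usize) {
    for i in 0..width {
        out[4 * i..4 * i + 4].copy_from_slice(&px(y[i], u[i], v[i]));
    }
}

const LANES: usize = 16;

/// Stand-in for a SIMD backend: full 16-pixel blocks, then a scalar tail.
fn rgba_row_blocked(y: &[u8], u: &[u8], v: &[u8], out: &mut [u8], width: usize) {
    let main = width / LANES * LANES;
    for base in (0..main).step_by(LANES) {
        // A real backend would load 16 Y/U/V bytes into vector registers here.
        for lane in 0..LANES {
            let i = base + lane;
            out[4 * i..4 * i + 4].copy_from_slice(&px(y[i], u[i], v[i]));
        }
    }
    // Tail: the 0..15 pixels that do not fill a whole block.
    for i in main..width {
        out[4 * i..4 * i + 4].copy_from_slice(&px(y[i], u[i], v[i]));
    }
}

/// Deterministic pseudo-random fill (simple LCG) so failures are reproducible.
fn fill_pseudo_random(buf: &mut [u8], seed: &mut u32) {
    for b in buf.iter_mut() {
        *seed = seed.wrapping_mul(1_664_525).wrapping_add(1_013_904_223);
        *b = (*seed >> 24) as u8;
    }
}

/// Returns true when both kernels produce byte-identical rows for `width`.
fn parity_holds(width: usize) -> bool {
    let mut seed = 0x1234_5678u32;
    let (mut y, mut u, mut v) = (vec![0u8; width], vec![0u8; width], vec![0u8; width]);
    fill_pseudo_random(&mut y, &mut seed);
    fill_pseudo_random(&mut u, &mut seed);
    fill_pseudo_random(&mut v, &mut seed);

    let mut expected = vec![0u8; width * 4];
    let mut actual = vec![0u8; width * 4];
    rgba_row_scalar(&y, &u, &v, &mut expected, width);
    rgba_row_blocked(&y, &u, &v, &mut actual, width);
    expected == actual
}
```

Widths such as 1, 3, 15, 17, and 33 matter because they exercise the tail-only case, the one-block-plus-tail case, and odd widths, which 4:4:4 permits since chroma is not subsampled.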

Local results (aarch64 macOS): 459 lib tests + 1 doctest pass; wasm32 + x86_64 cross-targets compile clean.

What's deferred

  • Tranche 4b: Nv24 + Nv42 semi-planar 4:4:4. Next PR; same dual-const-generic shape as PR #17 (NV12/NV21).
  • Tranche 4c: Yuv440p. Wiring-only PR after 4a; reuses this PR's yuv_444_to_rgba_row (4:4:0 = 4:4:4 with half-height chroma).
  • Tranches 5–7 — high-bit-depth families.
  • with_rgba_u16 ships in tranches 5–7.
  • YUVA source frames (Ship 8b) — independent follow-up.

Test plan

  • CI green on test, test-sde-avx512, cross, coverage, clippy, build, miri-* jobs.
  • Per-tier coverage matrix exercises SSE4.1 / AVX2 / scalar paths via existing colconv_disable_* rustflags.
  • Verify Yuv444p → RGBA pipeline end-to-end with a real frame (gray + non-gray patches).
  • cargo doc --lib --no-deps clean (no new doc warnings vs. main).

🤖 Generated with Claude Code

al8n requested a review from Copilot on April 26, 2026 05:32
al8n changed the title from "update" to "feat(sinker): Ship 8 — Yuv444p RGBA via const-ALPHA template" on Apr 26, 2026

Copilot AI left a comment

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR adds native RGBA output support for YUV 4:4:4 planar (Yuv444p) across the MixedSinker path and row-kernel layer, including scalar and SIMD backends, plus tests to validate correctness and SIMD equivalence.

Changes:

  • Add yuv_444_to_rgba_row dispatcher and scalar/SIMD implementations (NEON, SSE4.1, AVX2, AVX-512, wasm simd128).
  • Extend MixedSinker<Yuv444p> with with_rgba/set_rgba and wire RGBA writing in PixelSink.
  • Add focused tests for Yuv444p RGBA output and SIMD-vs-scalar equivalence.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.

| File | Description |
|---|---|
| src/sinker/mixed.rs | Adds RGBA buffer attachment for Yuv444p and emits RGBA during sink conversion + tests. |
| src/row/mod.rs | Exposes public yuv_444_to_rgba_row dispatcher with SIMD selection. |
| src/row/scalar.rs | Adds scalar RGBA kernel and refactors RGB/RGBA into a shared const-generic implementation. |
| src/row/arch/x86_sse41.rs | Adds SSE4.1 RGBA kernel and shared RGB/RGBA core + equivalence tests. |
| src/row/arch/x86_avx2.rs | Adds AVX2 RGBA kernel and shared RGB/RGBA core + equivalence tests. |
| src/row/arch/x86_avx512.rs | Adds AVX-512 RGBA kernel and shared RGB/RGBA core + equivalence tests. |
| src/row/arch/wasm_simd128.rs | Adds wasm simd128 RGBA kernel and shared RGB/RGBA core + equivalence tests. |
| src/row/arch/neon.rs | Adds NEON RGBA kernel and shared RGB/RGBA core + equivalence tests. |

Comment thread src/row/scalar.rs
Comment on lines 420 to +424
  debug_assert!(y.len() >= width, "y row too short");
  debug_assert!(u.len() >= width, "u row too short");
  debug_assert!(v.len() >= width, "v row too short");
- debug_assert!(rgb_out.len() >= width * 3, "rgb_out row too short");
+ let bpp: usize = if ALPHA { 4 } else { 3 };
+ debug_assert!(out.len() >= width * bpp, "out row too short for {bpp}bpp");

Copilot AI Apr 26, 2026

The debug_assert! message uses {bpp} formatting but does not pass bpp as an argument, which will fail to compile due to an unmatched named format parameter. Pass bpp to the macro (e.g., via positional or named formatting), or escape the braces if you intended them to be literal.
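A minimal sketch of the pattern in question, assuming Rust 1.58 or later, where `format!` can capture an in-scope identifier such as `bpp` by name. Passing `bpp` explicitly, as the comment suggests, is an equivalent spelling. How the assert!-family macros treat a lone message literal depends on the edition, so this sketch goes through `format!` to stay edition-independent. `out_len_error` is a hypothetical helper, not part of the PR.

```rust
/// Hypothetical helper mirroring the length check under review.
/// Returns the message the assertion would carry, or None if the row fits.
fn out_len_error<const ALPHA: bool>(out_len: usize, width: usize) -> Option<String> {
    let bpp: usize = if ALPHA { 4 } else { 3 };
    if out_len >= width * bpp {
        None
    } else {
        // `{bpp}` is captured from the enclosing scope. Equivalent spellings:
        //   format!("out row too short for {}bpp", bpp)
        //   format!("out row too short for {bpp}bpp", bpp = bpp)
        Some(format!("out row too short for {bpp}bpp"))
    }
}
```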

Comment thread src/sinker/mixed.rs
Comment on lines +1626 to +1648
// Native RGBA: independent kernel run, separate from RGB. Default
// alpha = 0xFF since Yuv444p has no alpha plane.
if let Some(buf) = rgba.as_deref_mut() {
    let rgba_plane_end =
        one_plane_end
            .checked_mul(4)
            .ok_or(MixedSinkerError::GeometryOverflow {
                width: w,
                height: h,
                channels: 4,
            })?;
    let rgba_plane_start = one_plane_start * 4;
    yuv_444_to_rgba_row(
        row.y(),
        row.u(),
        row.v(),
        &mut buf[rgba_plane_start..rgba_plane_end],
        w,
        row.matrix(),
        row.full_range(),
        use_simd,
    );
}

Copilot AI Apr 26, 2026

When callers request both RGB and RGBA outputs, this design runs two independent YUV→RGB conversions (one for RGBA here and another later for RGB/HSV), duplicating the core per-pixel math. Consider a combined path when both rgb and rgba are set (single conversion loop that stores into both buffers), or preferentially compute one format and derive the other with minimal overhead, to avoid doubling the hottest work.
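One way to read the second option in the comment above (compute one format, derive the other) is to run the YUV→RGB kernel once and then expand the packed RGB row into RGBA with a constant alpha byte. The sketch below is only an illustration of that idea: `expand_rgb_to_rgba_row` is a hypothetical name, and nothing here claims to match how the follow-up PR implemented its fan-out.

```rust
/// Hypothetical helper: expand a packed RGB row into RGBA with opaque alpha.
/// Assumes tightly packed rows (3 bytes per pixel in, 4 bytes per pixel out).
fn expand_rgb_to_rgba_row(rgb: &[u8], rgba: &mut [u8], width: usize) {
    assert!(rgb.len() >= width * 3, "rgb row too short");
    assert!(rgba.len() >= width * 4, "rgba row too short");

    let src = rgb[..width * 3].chunks_exact(3);
    let dst = rgba[..width * 4].chunks_exact_mut(4);
    for (s, d) in src.zip(dst) {
        d[..3].copy_from_slice(s); // R, G, B copied unchanged
        d[3] = 0xFF; // no alpha plane in the source, so always opaque
    }
}
```

The expansion still reads and writes every pixel, but it skips the per-pixel matrix math; whether that is a net win over a second kernel run would need to be measured per backend.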

Comment thread src/row/arch/neon.rs Outdated

codecov Bot commented Apr 26, 2026

Codecov Report

❌ Patch coverage is 70.29703% with 60 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/row/arch/x86_avx512.rs | 0.00% | 34 Missing ⚠️ |
| src/row/arch/neon.rs | 80.55% | 7 Missing ⚠️ |
| src/row/mod.rs | 72.00% | 7 Missing ⚠️ |
| src/row/arch/x86_avx2.rs | 82.35% | 6 Missing ⚠️ |
| src/row/arch/x86_sse41.rs | 82.35% | 6 Missing ⚠️ |

@uqio uqio merged commit 1fa7e95 into main Apr 26, 2026
57 of 58 checks passed
@uqio uqio deleted the feat/ship8-rgba-yuv444p branch April 26, 2026 05:52
uqio added a commit that referenced this pull request Apr 26, 2026

Tranche 4b of Ship 8 sink-side RGBA. Adds `Nv24` / `Nv42` (semi-planar 4:4:4) RGBA output via the dual-const-generic `<SWAP_UV, ALPHA>` template established by PR #17 (NV12 / NV21), and **retro-applies a Strategy A combined RGB→RGBA fan-out to all 8 wired families** so callers attaching both `with_rgb` and `with_rgba` no longer pay the per-pixel YUV→RGB math twice — addresses the Copilot review finding from PR #19 (`src/sinker/mixed.rs:1648`).
uqio added a commit that referenced this pull request Apr 26, 2026

Tranche 4c of Ship 8 sink-side RGBA. Wiring-only PR — adds `Yuv440p` (4:4:0 planar, 8-bit) RGBA output by reusing the `yuv_444_to_rgba_row` dispatcher that shipped in PR #19. **No new kernel code** anywhere in the crate; per-row math is identical to 4:4:4 (full-width chroma) — only the walker reads chroma row `r / 2`.
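The "walker reads chroma row `r / 2`" remark can be sketched as follows. This is a minimal illustration assuming tightly packed planes (stride equal to width) and a chroma plane of `(height + 1) / 2` rows; `walk_yuv440p` and its callback shape are hypothetical, standing in for whatever drives the real 4:4:4 row kernel.

```rust
/// Hypothetical 4:4:0 walker: full-width chroma, half-height chroma.
/// For each luma row `r`, hands the row callback the chroma row `r / 2`.
fn walk_yuv440p<F>(y: &[u8], u: &[u8], v: &[u8], width: usize, height: usize, mut row_fn: F)
where
    F: FnMut(&[u8], &[u8], &[u8], usize),
{
    let chroma_rows = (height + 1) / 2;
    assert!(y.len() >= width * height, "y plane too short");
    assert!(u.len() >= width * chroma_rows, "u plane too short");
    assert!(v.len() >= width * chroma_rows, "v plane too short");

    for r in 0..height {
        let c = r / 2; // two consecutive luma rows share one chroma row
        row_fn(
            &y[r * width..(r + 1) * width],
            &u[c * width..(c + 1) * width],
            &v[c * width..(c + 1) * width],
            r,
        );
    }
}
```

Since each chroma row is already full width, the callback can be the same per-row conversion used for 4:4:4; only the row index mapping differs.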